arXiv:1508.07503v2 [physics.soc-ph] 1 Dec 2015 


Two-state Markov-chain Poisson nature of 
individual cellphone call statistics 


Zhi-Qiang Jiang^, Wen-Jie Xie^, Ming-Xia Li^, Wei-Xing 
Zhou^, and Didier Sornette^’^ 

^ School of Business, Department of Mathematics, and Research Center for 
Econophysics, East China University of Science and Technology, Shanghai 200237, 
China 

^ Department of Management, Technology and Economics, ETH Zurich, 
Scheuchzerstrasse 7, CH-8092 Zurich, Switzerland 

^ Swiss Finance Institute, c/o University of Geneva, 40 blvd. Du Pont d’Arve, 
CH-1211 Geneva 4, Switzerland 

E-mail; wxzhou@ecust.edu.cn (Wei-Xing Zhou) and dsornette@ethz.ch 
(Didier Sornette) 


PACS numbers: 89.65.Ef, 89.75.Fb, 89.90.-fn 

Abstract. Humans are heterogenous and the behaviors of individuals could be 
different from that at the population level. Such differences could originate from 
(1) different mechanisms governing individual activities, (2) the same mechanism but 
with different scales, or (3) both of them. We conduct an in-depth study of the 
temporal patterns of cellphone conversation activities of 73,339 anonymous cellphone 
users with the same truncated Weibull distribution of inter-call durations. We find that 
the individual call events exhibit a pattern of bursts, in which high activity periods 
are alternated with low activity periods. Surprisingly, the number of events in high 
activity periods are found to conform to a power-law distribution at the population 
level, but follow an exponential distribution at the individual level, which is a hallmark 
of the absence of memory in individual call activities. Such exponential distribution is 
also observed for the number of events in low activity periods. Together with the 
exponential distributions of inter-call durations within bursts and of the intervals 
between consecutive bursts, we demonstrate that the individual call activities are 
driven by two independent Poisson processes, which can be combined within a minimal 
model in terms of a two-state first-order Markov chain giving very good agreement 
with the empirical distributions using the parameters estimated from real data for 
about half of the individuals in our sample. By measuring directly the distributions 
of call rates across the population, which exhibit power-law tails, we explain the 
difference with previous population level studies, purporting the existence of power-law 
distributions, via the “Superposition of Distributions” mechanism: The superposition 
of many exponential distributions of activities with a power-law distribution of their 
characteristic scales leads to a power-law distribution of the activities at the population 
level. Our results and model provide a simple universal description of the diversity of 
individual behaviors avoiding the caveat of model misidentification resulting from the 
use of population level data. Our findings shed light on the origins of bursty patterns 
in other human activities. 
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1. Introduction 

Human interactions are of particular importance in the dynamics of disease spreading 
and social contagions. Complex networks and human dynamics provide two different 
perspectives on the understanding of such interactions. Many appealing topological 
structures, differing from random graphs, have been uncovered in social interactive 
networks, such as small world [1], scale-free degree distribution [2], community 
weak tie [5], to list a few. The bursty characteristics of the temporal patterns are revealed 
in various types of human interacting activities from the perspective of human dynamics, 
such as email communications PE], short-message correspondences PEltto], cellphone 
conservations mm, letter correspondences [131 EH ES]) so on. Because static 
networks are limited to the description of sequences of instantaneously interacting links, 
temporal networks have received considerable research attention recently uni imiiH]. 
Indeed, temporal networks have the advantage that temporal patterns of interacting 
activities can be accounted for on each individual node. This suggests that we should 
pay more attention to the analysis of temporal patterns of human activities at the 
individual level rather than at the population level. 

Overwhelming evidence shows that human activities differ from Poisson processes 
and exhibit burst patterns in which the high activity periods are alternated with the 
low activity periods. Such burst patterns will give rise to fat tailed distributions of 
inter-event intervals. Several mechanisms have been proposed to explain such fat tailed 
distributions, including priority-queuing processes driven by human rational decision 
making 0 EH [2Q1 [21], Poisson processes modulated by circadian and weekly cycles 
u\m, adaptive interests [22], heterogeneity in the size distribution of social influences 
[23] . and so on. However, not all of the activities in our daily life can be interpreted 
by one of the proposed mechanism alone. For instance, the temporal patterns of short 
message communications are dominated by both Poissonian task initiations and decision 
making for task execution [9]. Moreover, the identihcation of the burst patterns plays 
a crucial role in understanding the memory behavior in human activities. Karsai et 
al. utilized an inter-event time threshold to locate the burst clusters and demonstrated 
that the deviation from an exponential distribution for the number of events in a burst 
cluster serves as an indicator of the memory behavior in the timing of events [121 EH ES] . 
By dehning that each event inside a burst, except the hrst and last ones, must have at 
least k neighbor events within At, [26] proposed a density-based algorithm to detect the 
burst clusters in the multi-event series. We also need to mention that, in the held of 
neuroscience, there are many methods to detect the burst patterns in the spike trains, 
such as Poisson Surprise method 1271 , Rank Surprise method [29l[2H], Robust Gaussian 
Surprise method [30], and so on. 

The enormous increase of popularity and the use of cellphones not only provides 
convenient high-efficiency real-time communication between people; it also facilitates 
information spreading and social contagion. One of the crucial ingredients needed 
to understand the dynamics of information diffusion is the set of temporal patterns 
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of individual cellphone conversation activities.As an illustration of the motivation for 
testing how population wide analyses inform on individual level studies and vice-versa, 
consider the hnancial and economic crisis of 2008, which led to strong criticisms of 
the economic profession for its use of the concept of the representative agent. While 
convenient mathematically, this assumption of homogeneisation is grossly insufficient 
to capture the heterogeneity of economic agent preferences, which is underpinning the 
complex nonlinear dynamics of the socioeconomic world. We believe that a number of 
high prohle analyses have sinned in the same direction by analyzing human activities at 
the population level and claiming the existence of non-Poisson behaviors with supposed 
long-memory and fat-tailed statistics at the individual level. 

In order to understand in greater depth the temporal patterns of cellphone 
conversation activities at the individual level and to decipher their immediate 
consequences for the simulation of individual call activities, we study the interacting 
activities of cellphone voice communications for each of 73,339 cellphone users. This 
research is a continuous work based on the results of ina. In that paper, an automatic 
htting technology is developed to hnd the distribution of the waiting time between 
consecutive outgoing calls of each individual in our selected sample, which is made of 
100,000 individual cellphone users each having at least 997 outgoing calls. It is found 
that there are 3,464 individuals with a power-law distribution of inter-call duration and 
73,339 individuals with a Weibull distribution of inter-call duration. The users with a 
power-law duration distribution exhibit extreme calling behaviors and can be classihed 
as robot-based callers, telecom fraud, and telephone sales. The users with a Weibull 
duration distribution correspond to the ordinary cell phone customers. Our analysis 
here is trying to understand the calling behaviors of the 73,339 individuals more deeply 
and hnd the key ingredients to model their individual call dynamics. We hnd that the 
calling events of the ordinary users can be separated into independent bursts. And we 
also hnd strong evidence that the bursts and the call events within bursts are dominated 
by two independent Poisson processes. We propose a minimal model by modulating the 
two Poisson processes by a Markov chain, which is in good agreement with our empirical 
results for about half of the users. 

Our results show that population level analyses are misleading, quite analogously to 
the misguided use of a representative agent in economics, as mentioned above. Mixing 
the dynamics of a heterogeneous population of agents gives the appearance of long- 
memory dynamics, which is just an artefact of the mixing process. In other words, 
people are diherent and cannot be represented by a single representative law. This is 
what we demonstrate in the present article. Our hnding underlines the need to fully 
account for heterogeneous preferences and behaviors in human population to understand 
and model adequately the burst patterns in individual activities. 
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2. Data Sets 

Our data include the call activities of 5,921,696 individuals in Shanghai. The records 
cover two separated periods, 28 June 2010 to 24 July 2010 and 1 October 2010 to 31 
December 2010. By excluding the days in which there are missing records, we have a 
total of 109 days. Each record entry includes the information on caller number, callee 
number, call starting time, and call length. The caller and callee number is encrypted in 
order to protect personal privacy. Note that the calls with zero durations are discarded . 
73,339 individuals in the top 100,000 most active individuals have been selected for our 
study based on the following criteria: these 73,339 chosen users have the same truncated 
Weibull distribution and exhibit normal behaviors of phone call activities [31] . It would 
be interesting to see whether the call activities of such phone users are governed by the 
same mechanism. 

We only take the outgoing calls to calculate the inter-call durations, because the 
incoming calls must be initiated by someone else and are byproducts of the outgoing 
calls. Note also that we are able to replicate the call events in the phone communication 
network only with respect to the process of individual outgoing calls. The key ingredients 
to model individual outgoing calls are: (1) when to initiate a call and (2) whom to call. 
Investigating the waiting time between consecutive outgoing calls is the hrst step towards 
understanding the hrst key ingredient when to initiate a call. We dehne the inter-call 
durations as follows. For a given series of outgoing call events {citf) : i = 1, • • • ,n}, 
where c{ti) is the event of the Tth out-going call and ti is its starting time, the inter-call 
duration is dehned as the time that elapses between two consecutive outgoing calls and 
it is calculated via di = t^ — tj_i. The discontinuous recording days and the inactive 
periods in the night result in very large inter-call durations. Therefore, we restrict the 
durations to a period of one day (the typical human circadian rhythm). Each day is 
partitioned at 4:00 A.M., at which the lowest call volume in a 24-hour period is observed. 
This also allows us to take into account the people who go out and stay awake later as 
well. Our consideration on the intraday inter-call durations are further supported by 
the following three arguments. (1) Human activities exhibit strong circadian rhythms 
[32] . On each day, people repeat their daily activities with strong regularity, typically 
going to work in the morning and coming back home in the evening. (2) 99.2% of the 
intercall durations from the last call before 4 AM to the hrst call after 4 AM are greater 
than 4 hours, which supports our conjecture that the intercall durations crossing the 
time point 4:00 AM is very large. Obviously, in more than 99% of the cases, this inactive 
period has no impact on separating the call events into burst sizes and non-burst sizes 
under the situation where we do not cut the call periods into daily pieces. (3) The likely 
reason for observing such large values of intercall durations is that most of the cellphone 
users are sleeping at that time. Apparently, the underlying dynamics here is different 
from that in the daytime. 
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3. Empirical Results 

We previously reported that the inter-call durations of each individual in our analyzing 
sample conform to the Weibull distribution |3T]. This implies that their call dynamics 
may be governed by the same mechanism. In order to uncover the underlying 
mechanism, we focus our efforts on the understanding of burst patterns in the individual 
calling process, which reveals the switching between the periods of high activity and low 
activity. Due to the limited usefulness of the Hurst exponent and of the autocorrelation 
function to describe dependence in such processes with a fat-tail inter-event distribution 
[33], we use the distribution of the number of events in high activity and low activity 
periods to indicate the presence of the correlated behaviors in the event series for each 
individual. Specihcally, a series of call events of one given individual is dehned as 
{c{ti) : i = where c(tj) is the event of the i-th out-going call and ti is its 

starting time. A high activity period is dehned as a cluster of consecutive calls delineated 
by the j-th and the k-th ones, Sh = {c(ti) ■l<j<i<k<n}, such that each such i-th 
call follows the pevious call within a short time interval At. In other words, all events i 
in that period must meet this requirement ti — ti-i < At. Note that At is a parameter 
of the empirical analysis. Figure [U shows the sequence of outgoing calls for one typical 
cellphone user on June 28, 2010. The red vertical lines correspond to the outgoing calls 
that fall within At = 500 seconds of previous calls. The number of events in each high 
activity period Ch is obtained by counting the total number of calls belonging to the 
same high activity period. Comparing with the burst size E dehned in Ref. na , we 
have Ch = E — 1. According to the dehnition [12], the burst size E should conform to 
the exponential distribution if the call events are independent and the deviation from 
an exponential distribution diagnoses the existence of a correlation between consecutive 
events. This conclusion remains valid for our Ch dehned in the present paper. 



t/hour 

Figure 1. (color online). Sequence of outgoing calls for one typical cellphone user on 
June 28, 2010. The red (respectively black) vertical lines correspond to the outgoing 
calls that follow their previous outgoing calls within a waiting time less (respectively 
greater) than At = 500 seconds. 

Complementarily, the low activity period is dehned as the set of events si = {c(ti) : 
l<j<i<k<n} in that period separated from their nearest previous call by a 
duration greater than At, which requires all the events in the same low activity period 
to obey ti — > At. These events are represented by black vertical lines in hgure [H 
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The number of events in each low activity period ei is obtained by counting the total 
number of events belonging to the same low activity period. Again, deviations of the 
distribution of ei from an exponential would imply the existence of a dependence in the 
timing of these events. 



Figure 2. (color online).Probability distribution of the number en of call events in high 
activity period at the population level. (A) Probability distribution of calculated 
with threshold At = 100, 300, 600, and 1000 seconds. (B) Probability distribution of 
Ch calculated with threshold At = q{d), in which q = 0.5, 1 and 3 and (d) is the mean 
inter-call duration. Note that, for each specific user, we use his/her mean inter-call 
duration to calculate At. (C) Probability distribution of the average inter-call duration 
(d) for the whole sample. 

Figure [2] A shows that the probability distributions of eh corresponding to different 
values of At for individuals in our sample all exhibit clean power-law behaviors with an 
exponent ~ 3.6 ± 0.4, which is in agreement with the power-law distribution of burst 
size E with an exponent around 4.1 shown in Fig. 2 A in Ref. na. Because the call 
frequency varies a lot for different individuals (see the distribution of individual mean 
inter-call duration in hgure|2]C), using the same value of At to calculate Ch for different 
users may give different value of Ch at different scales, which could be a possible source of 
the power-law distribution. In order to remove such influence, we express the threshold 
At in units of (d), At = q{d), where (d) is the average value of inter-call durations 
for each specihc cell phone user. The probability distributions of Ch corresponding to 
different values of q for all individuals in our sample are plotted in hgure|2]B. Again, 
we see very nice power-law behaviors with an exponent around 5.3 in the tail. Figure [2] 
C plots the probability distribution of the mean inter-call durations for the individuals 
in our sample. The mean duration spans a range from 50 to 10,000 seconds, which 
indicates a strong heterogeneous call frequency for the individuals in our sample. 

For all our different specihcations of the threshold At, we find that the power-law 
distributions of the number Ch of events in high activity are robust. Since our focus 
here is to understand the individual calling patterns, it would be interesting to check 
whether the distributions of eh for each individual exhibit the same pattern as that of 
the aggregated sample. We further scan the values q = 0.5, 1, and 2 to investigate the 
influence of At on the distributions of the number Ch of events of in high activity periods 
and the number e/ of events in low activity periods for each individual in our sample. 
Figure [3] shows the distributions of Ch and ei for three users. Surprisingly, we find that 
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Figure 3. (color online). Complementary cumulative distribution of the number of 
events Ch in high activity period (A-C) and the number of events e; in low activity 
period (D-E), for three individuals, 6493248, 674842, and 961831 (one column for each 
individual). Each marker corresponds to a different q, defined such that At = q{d), 
where (d) is the average value of individual inter-call durations and At is the threshold 
used to define high activity period and low activity period. 


both the number of events in high activity periods and low activity periods conform to an 
exponential distribution for different values of q. These results are in striking contrast to 
the power law distributions of eh in £gure[2]B at the population level. Such exponential 
distributions of Ch and ei at the individual level supports strongly the hypothesis of 
no temporal dependence between individual calls in both high activity periods and low 
activity periods. Our results also provide a caution that building individual model based 
on empirical results at the population level may be misleading. 

As illustrated in Figure [H given a threshold At, we have classified all outgoing call 
events into two groups, the events in high activity period (red lines) and the events in low 
activity period (black lines), according to the waiting time from their previous nearest 
neighbors. Using this classihcation, we dehne a bursty cluster as a set of successive 
calls, where the first one belongs to the low activity class (black line) and all the others 
are part of the high activity class (red lines). The cluster ends at the last call in the 
bursty class that is followed by an event in the low activity period. Such a cluster can 
be interpreted as a sequence of high activity calls somehow triggered or related to the 
hrst one that acts as an initiating event. The clusters found for the call sequence shown 
in hgure [U are depicted by shadow areas. Note that not all black lines are following by 
red lines, indicating that not all calls initiate other calls. 

For a given configuration of individual call bursts with a specific threshold At = 
q{d), we investigate the probability distributions of inter-call durations db within bursts 
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Figure 4. (color online). Complementary cumulative distributions of inter-call 
durations within bursts (A-C) and time intervals between consecutive bursts (D-E) 
for three individuals (one column for each individual). Each color corresponds to a 
different q, defined such that At = q{d), where {d) is the average value of individual 
inter-call durations and At is the threshold used to define high activity period and low 
activity period. 


and of the time intervals between consecntive bursts. Figure 0] shows these two 
complementary cumulative distributions for three individuals. One can observe that the 
distribution of d^ conforms approximately to a right-truncated exponential distribution 
for different values of q which, together with the exponential distribution of eh in high 
activity periods, provides strong supports to the hypothesis that the calls in burst 
clusters are generated by a Poisson process. And the distributions of the time intervals 
dn between consecutive bursts are close to left-truncated exponential distributions, the 
more so, the larger q is, which suggests that the bursts are also generated by a Poisson 
process. The Poisson initiations of bursts are also observed in individual short message 
communications [9]. 

Motivated by the observations in Figure IH we conjecture the existence of an optimal 
threshold Atopt = Qoptid), which can be determined as the value that minimizes the 
residuals of the hts of the empirical distributions by truncated exponential distributions. 
To determine Atopt, we perform a Maximum Likelihood Estimation (MLE) of the 
empirical distributions using the two following expressions: 

(i) the left-truncated exponential distribution of the time intervals dn between 
consecutive bursts is given by 

p{dn, Xn) = Xn exp(-A„d„)/ exp(-A„Af) ; (1) 

(ii) the right-truncated exponential distribution of the inter-call durations dj, within 
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bursts is 

p{db, Xb) = Xbdbexp{-Xbdb)/[l - exp(-AbAt)] . (2) 

We use the complementary cumulative distributions Cat derived from the probability 
distributions (([T]) and ([2])), to dehne the htting residual, as proposed in Ref. [9], between 
the empirical and theoretical complementary distribution functions: 


R = 


1 

i,nt i,emp 

2 

l 

i,nt i,emp 

n 

1 

i,nt i,emp 


c%7+c^ 

i,nt i,emp 

i 




m 


(3) 


where g and m are the number of observations in the sample of dn and db, which are 
generated by the threshold At. 

By varying q := At/{d) from 0.2 to 4 with a step of 0.01, we estimate the htting 
residuals R for each q and obtain the optimal threshold as Atopt = Qopt{d), with gopt 
being the value that is associated with the minimum htting residuals as illustrated 
in hgureO We hnd that the distributions of db and dn can be well described by truncated 
exponential distributions at the optimal Atopt- This conhrms that individual cellphone 
call sequences can be generated by two Poisson processes, in which one initiates call 
events, and the other represents the stochastically activated bursts. 

We separate the call events into diherent bursts according to the best gopt and 
perform four additional KS tests. The corresponding hypothesises are that (Hi) the 
number of events in the high activity periods conforms to an exponential distribution, 
{H 2 ) the number of events in the low activity periods obeys an exponential distribution, 
{H 3 ) the inter-call duration inside bursts follows to a right-truncated exponential 
distribution, and {Hf) the waiting time between consecutive bursts complies a left- 
truncated exponential distribution. At the signihcant level of 0.01, the hypothesis Hi 
and H 2 cannot be rejected for 66,918 out of 73,339 (about 91.2%) the individuals, which 
provides very strong evidence in favor of the Poisson nature of the call events in the 
high and low activity periods. Furthermore, it is found that the hypothesis H 3 and 
774 cannot be rejected for 27,387 individuals (about 37.3% of the whole population) 
at the signihcant level of 0.01, in which 26,631 individuals (about 36.3% of the whole 
population) also accept the hypothesis Hi and H 2 at the same level. Such results 
support the our conjecture that the call dynamics are dominated by two alternated 
Poisson processes for about 36.3% of the individuals in our sample. 


4. Model 

Our empirical results provide a very clear foundation for conceptualising our model, 
which is constructed on three key ingredients: (1) a Poisson process to trigger the 
calling burst clusters, (2) a Poisson process to activate the call events inside burst 
clusters, and (3) an alternation of the two Poisson processes. In some sense, our 
model is similar to the cascading nonhomogeneous Poisson model of individual email 
activities [7], which has two essential ingredients: (1) generating the bursts according 
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Figure 5. (color online). Results of the optimal threshold for three individuals, 
6493248, 674842, and 961831 (one row for each one). (A-C) Plots of the fitting residual 
(l3|) as a function of g = At/{d), where (d) is the average value of inter-call durations 
and At is the threshold used to define high activity period and low activity period. 
The marker x represents the value (^opt for which the fitting residual (l3|) attains its 
minimum i?min)- (D"E) Probability distributions of inter-call durations within bursts, 
using Atopt- (G-H) Probability distributions of the time intervals between consecutive 
bursts for Atopt- 


to the circadian and weekly cycles of human activity and (2) assigning the number of 
events into a burst based on a distribution. However, our model has the advantages 
of simplicity of its components, smaller computation power needs for calibration, and 
simple implementation in simulations. Our model also benehts from a strong empirical 
foundation concerning the Poisson nature in individual phone call activities. In addition 
to providing an important way for understanding how normal cell phone users activate 
their personal call events, our model also offers a wide range of applications for the 
simulation of phone call traffic and of social contagion within human communication 
networks. 

Based on the model conception, the individual call dynamics is modelled by 
combining the two Poisson processes within a two-state hrst-order Markov chain. One 
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Poisson process Vn with intensity A„ models the activation of bursts. The other Poisson 
process Vb with intensity Xb models the flow of call events within bursts. We dehne as 
the state in which calls are generated by the process Vn and Sb as the state in which calls 
are dominated by the process Vb- The two states swap from one to the other according 
to a transition (or conditional) probability pij, which can be estimated empirically, 

^ . I • P(* = Sn, j = Sb) ^ 

Pij = PU = Sb\t = Sn) = - p -^- • (4) 

P(i = Sn) 

Here, p{i = Sn) is the probability that a call is in state s„, which can be estimated as the 
ratio between the total number of calls in state Sn to the total number of calls. Then, 
p{i = Sn,j = Sb) is the joint probability for the occurrence of two consecutive calls that 
the hrst one is in state and the second one is in state Sb- 

To be able to generate synthetic call sequences, we add a minimum call duration 
tp that reflects the characteristic time of a human conversation. We simulate our model 
as follows: 

(a) Assuming that the initial state is s„, we generated a call according to the process 
Vn, in which the call is initiated with the probability of Xntp] 

(b) Once a call occurs, the state for generating the next call is determined according to 
the transition probability Pij. For example, if the current state is Sn, the next state 
will be Sb with the probability p{j = Sb\i = Sn), or will remain with probability 
p{j = Sn\i = Sn); 

(c) When a new state is determined, the new call is generated according to the Poisson 
process in that state. If the new state is the new call is initiated with probability 
of Xntp. If the new state is Sb, the new call is initiated with probability of Xbtp. 
When the new call is generated, we go back to step (b); 

(d) When the simulation time reaches our target, the program stops. 

We estimate the parameters tp, Xb, A„ and the transition probabilities from the 
empirical data for each cellphone users and inject them in the above model to simulate 
their specihc call dynamics. Figure [6] compares the obtained distributions of inter¬ 
call durations obtained by our simulations of three individuals with their empirical 
counterparts. We hnd an excellent agreement. We further use the log-residual between 
the two complementary cumulative distribution, formulated as ln(C'M/C'E), to quantify 
the agreement between the model and the real data. Figure [6] D plots the log-residual 
at each inter-call time d for three users. It is observed that our model exhibit a very 
nice agreement with the real data up to d < 10^. We further estimate the probability of 
log-residual conditioned on the inter-event time for all 73,339 users under consideration. 
This conditional probability is illustrated in hgure |6] E. We also plots the average log- 
residual of all users as a solid line. One can see that our model seems to overestimate the 
data when d > 10^. The possible reason for this could be that our optimized objective 
function used in the parameters estimation does not aim at minimizing the residuals 
of the cumulative distribution of the inter-call time between the simulating data and 
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real data, but aiming at minimizing the fitting residuals of two truncated exponential 
distributions (see Equation ([3])). 




d d 


Figure 6. (color online). Model comparisons. (A-C) Comparison of the distributions 
of inter-call waiting times in our simulations and in the real data for three individuals, 
6493248, 674842, and 961831. colorredThe p-value are shown under the individual 
ID. (D) Plot of the log-residual between the model data and the empirical data 
(ln(C'M/C'E)) with respect to the inter-call duration d for three users. (E) Plots of 
the average log-residual at each inter-call time for all 7,3339 users. The color bar 
represents the probability of log-residual conditioned on the inter-call time. All the 
model data are averaged over 20 simulations. 


5. Statistical tests 

If our model can explain the daily individual call activities, at least, the intercall 
durations from the model should have the same distribution as the empirical data. 
We perform a strict statistical test to check the agreement of the intercall duration 
distributions between the empirical data and our proposed model for each mobile phone 
user. Rather than using the Kolmogorov-Smirnov test, which is a point test strongly 
influenced by the central part of the distribution [34], we adopt an integral-based test 
of the goodness of £t of the empirical distributions by our two-state hrst-order Markov 
chain combining the two Poisson processes, dehned in terms of the statistics, 

^ = / \Ce{x) - CM{x)\dx , 

J X 


( 5 ) 
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in which x = In d. This statistic A quantihes the distance between empirical and model 
distributions as given by the (absolute value of the) area between the two complementary 
cumulative distributions [7j. 

The null hypothesis of the Monte Carlo test is that the intercall durations of the 
real individual call events and the modelling call events, which is generated by the two- 
state hrst-order Markov-chain combining the two Poisson processes, are drawn from the 
same distribution. The rejection of the null hypothesis implies the failure of our model 
in the explanation of the individual call activities. If the null hypothesis cannot be 
rejected, this equals to being against the alternative hypothesis, that both samples of 
intercall durations do not come from the same distribution, and giving signals that our 
model can explain the individual call activities at least in the aspect of the duration 
distribution. Such kind of decision tests, including the KS test and the CvM test, is 
widely used to determine that one sample of data come from a reference distribution 
or two samples of data are drawn from the same distribution. Our model is rejected if 
the absolute difference between A and is greater than the critical value associating 
with the probability 1 — a/2. In other words, we can reject the null hypothesis if the 
p-value is less than a. The p-value is dehned by Pr(|y4s — (As)| > \A— (^s)|), where As 
is the same as A but obtained from the analysis on the model data, which is generated 
by our proposed model with the best parameters estimated from the real data. 

The procedure to calculate the p-value of this statistic A is proceeded as follows: 

(a) Generation of the model data with the best parameters estimated from real data; 

(b) Calculation of the statistic A between the obtained model distribution and its 
empirical counterpart; (c) Regarding the model data as playing the same role as real 
data, generation of synthetic data with the best parameters estimated from the model 
data; (d) Calculation of the statistic Ag between the distribution obtained from the 
model and its synthetic replicas. One thousand replicas are generated; and (e) This 
allows to evaluate a two-tailed p-value, dehned as Pr(|y4s — (As)| > 1^4 — (As)|), where 
(As) is the mean value of the synthetic area statistics. 

The probability distribution of p-values is illustrated in £gure[7]A. At the signihcant 
level a = 0.01, we hnd that 35’703 individuals have a p-value larger than 0.01. This 
implies that, for this population, the null hypothesis that our model is the generating 
process of the call sequences can be rejected at the 99% confidence level. 

In hgure 0 B-F, we compare the distributions of the calibrated model parameters 
tp, Xf, and A„ and of the transition probabilities, for the 35’703 individuals who pass 
the p-value test and the other 37’636 individuals who do not. The two distributions 
of tp, \b and \n are statistically identical for the two groups of individuals. This 
supports the two-state Markov-chain Poisson model of the inter-call durations at the 
individual level. The broad distributions of parameters highlight the heterogeneity 
in the individual calling behaviors and support our message about the importance of 
calibrating at the individual rather than at the population level. Specihcally, one can 
observe that the tails of the distributions of \b and A„ are approximately power laws 
(linear in the log-log representation of panels C and D of Figure!^. The least-square 
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Figure 7. (color online). Results of statistical tests. (A) Cumulative distribution of 
p-value. (B-F) Comparison of the distributions of model parameters and transition 
probabilities for those individuals who pass the p-value test and the others who do not 
pass. Panels B-F correspond to the distribution of tp, Xb, A„, p{j = = Sb), and 

P{j = Sb\i = Sn), respectively. 


method gives the power-law exponents 4.7 for Xj, and 7.6 for A„. This means that the 
rate of activity of each individual is well-approximated by an exponential distribution 
but the distribution of the exponential rates across the population is itself a power 
law. Another intriguing observation is that the power-law exponent of Xb is close to the 
power-law exponent 5.6 of in hgure [2] B. This can be explained by the mechanism 
of “Superposition of Distributions” described in chapter 14.4.1 of Ref. [35], one obtains 
power law distributions of activities simply by the superposition of many exponential 
distributions with power law distributions of their characteristic scales. Therefore, we 
are led to suggest that the power law distribution of the number of events in bursty 
period at the population level in hgure |2B is simply the result of this mechanism of a 
“Superposition of Distributions”. Our hndings thus provide an alternative explanation 
of the population-level results, which is different from Karsai et ah [Q |2l] in which the 
advanced explanations in terms of long memory behavior and reinforcement process is 
proposed at the individual level. 

Panels E and F of hgure 0 identify signihcant diherences between the distributions 
of transition probabilities. Individuals who pass the test have a relatively smaller 
probability p{j = Sn\i = Sb) to transit from the bursty state to the non-bursty state and 
a relatively large transition probability p{j = Sb|i = to cross from the non-bursty 
state to the bursty state. 
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6. Discussion 

Our results on the statistics of calls differ from that concerning short text messages. In 
Ref. |9], it was found that the initiation of bursts in short messages is Poisson, as for the 
calls found here. However, the inter-message durations within bursts are distributed as 
a power law, while we reported an exponential distribution of inter-call durations within 
bursts. The likely reason is that text messages often follow a “sending-response” pattern 
while calls are self-contained so that a caller is answered usually in the same call to her 
query. In contrast, the “sending-response” nature of many text messages introduces 
an inherent delay that is in general broadly distributed as a result of priority-queuing 
processes [6l[l9l[2ni[2l], circadian and weekly cycles mm, adaptive interests [22], and 
so on. 

In summary, we have found that individual call activities can be separated into 
independent bursts, and that the bursts and the call events within bursts are driven by 
two independent Poisson processes. We found the existence of a memoryless behavior 
of individual call activities, which are diagnosed by the exponential distribution of the 
number of call events in both high activity and low activity periods. We also documented 
exponential distributions of the time intervals between consecutive bursts and of the 
inter-call duration within bursts. We have proposed a minimal model to replicate the 
call activities for each individual, based on three ingredients: Poisson initiation of bursts, 
Poisson activation of calls within bursts and Markov-chain state shift between the two 
Poisson processes. We have integrated these three ingredients in a two-state hrst-order 
Markov-chain combining the two Poisson processes. By using the parameters estimated 
from real data, our model is able to reproduce the empirical results very well. The 
Monte Carlo tests conhrm that the validity of our model, which cannot be rejected at the 
signihcant level of 0.01 for 48.68% of the total population of 73’339 cellphone users that 
we have analyzed. Our model is able to account for the diversity of individual behaviors 
within a simple universal structure. It avoids the caveat of model misidentihcation 
resulting from the use of population level data. 
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